ABSTRACT

The analysis of jury size and jury verdicts in criminal
matters now has a long, though interrupted, history. Work on
this subject in the 18th and 19th centuries by Condorcet and
Laplace is discussed, and the Poisson model of the 1830's is
highlighted. The latter is modified to analyze the American
jury experience of the 20th century. Recent U.S. Supreme
Court decisions in the 1970's on jury size and jury decision
making have created a resurgence of interest, especially in a
comparison of six-member and twelve-member juries. Some
comparisons of size in terms of probabilities of errors in
verdicts are presented.
*Based on an Invited Talk sponsored by the American Statistical
Association, the Office of Naval Research, the French Embassy
and George Washington University, given in Washington, D.C. at
a Conference celebrating the 200th anniversary of the birth of
Simeon D. Poisson.


INTRODUCTION 


Studies looking into the association of jury size and jury
verdicts appear frequently these days. Articles analyzing some
aspects of this subject can be found in journals and books produced
for legal scholars, social psychologists, political scientists,
statisticians and others. Several articles are essentially reviews
or surveys of existing literature, even though the time span for
published articles in this field is relatively short; see, for
example, Penrod and Hastie (1979). In a report to the Federal
Judicial Center, Saks (1981) reviews and analyzes a large literature
on small group research and the application of the results to
American jury behavior. Sometimes one jury size model is subjected
to an extensive critique in terms of its relevance and
approximation to reality, as in Kaye (1980).

A series of U.S. Supreme Court decisions on jury size and
jury decision rules in criminal cases, beginning in 1970 with
Williams v. Florida, was the motivation for this burgeoning
industry. Other decisions soon followed: in 1972 (Johnson v. Louisiana,
Apodaca v. Oregon), in 1978 (Ballew v. Georgia) and in 1979 (Burch
v. Louisiana). Williams permitted six jurors in state felony
trials (reserving twelve for federal felony trials); the next two
decisions permitted jury verdicts based on nine-out-of-twelve and
ten-out-of-twelve majorities in state felony trials; but then
Ballew ruled unconstitutional a jury of size five in a state criminal
trial, and Burch ruled unconstitutional a five-out-of-six majority
decision in a state criminal trial.

Thus, in the decade spanning the 1970's, important decisions
were rendered on jury size and majority requirements for decisions.
For the first time, in a very public way, twelve-member juries and
unanimity were no longer sacrosanct in felony trials. Smaller
juries and majority verdicts had been in existence a long time, but
they had not been challenged. Some social science writers have
referred to the evolving Supreme Court position on jury size as
being on a 'slippery slope'. More studies and probably more
Supreme Court decisions will follow. The closest precursor to this
kind of scholarly, legislative, and judicial activity took place
in France at the turn of the 19th century and continued for almost
50 years. The French school of mathematicians engaged in probability
theory examined jury size and the jury as a decision-making
body from both a theoretical and an empirical point of view.
Among them were eminent savants such as Condorcet, Laplace,
Poisson and Cournot.

Today there is an extensive literature, published in the last
decade, reporting on empirical studies of jury size and jury
decision making. Prolific contributors along these lines are Davis
and his colleagues and students at the University of Illinois
(1973, 1975, 1977, 1978). In this article we will touch on these
studies, but emphasis will be given to probabilistic models of jury
size and jury behavior. The number of investigators working on such
models is much smaller than the number engaged in empirical efforts.

Those of us who engage in probabilistic models owe a debt to
S. D. Poisson and his pioneering work on this topic, published in
his 1837 book "Recherches sur la probabilité des jugements en
matière criminelle et en matière civile". This work contains a
detailed and somewhat discursive exposition of a jury behavior
model motivated and supported by data on jury trials and verdicts
in France in the period 1825-1833. Of special interest to Poisson
was the calculation of the probabilities of the two kinds of errors
possible in jury verdicts, namely, the probability of convicting
an innocent person and the probability of acquitting a guilty
person. The U.S. Supreme Court is somewhat remiss in ignoring these
errors in its decisions. They could be quite difficult to quantify
in the American legal experience, but some recognition of this
problem could have been demonstrated.

In Williams v. Florida, the Court discusses the unconditional
probability of conviction for juries of sizes six and twelve, and
asserts that these probabilities do not differ in any operational
sense. A number of empirical studies of small group and jury
behavior are referenced in Supreme Court decisions, but studies of
probabilistic models for jury behavior do not appear except in the
Ballew decision. The probabilistic model discussed in that opinion
was subsequently shown to be weak and unwise. It may be that the
results of such models are too precarious or unreliable to serve
as components of Supreme Court decisions, or that the law clerks
and justices are uncomfortable or unfamiliar with that kind of
thinking.

The French School: Condorcet and Laplace


Questions of this kind did perplex the French probabilists.
For example, in considering the judgments of juries or tribunals,
Laplace (1820) refers to the following risk principle: 'the proof
of the crime of the accused ought to have a high degree of
probability that the citizens have less dread of errors in judgment, if
the accused be innocent and condemned, than of his new attempts
and those of the unhappy ones whom the example of his impunity
encourages, if he was guilty and absolved'.

This principle is attributed to Condorcet by Karl Pearson
(1978). In his pioneering work on juries and testimony, Condorcet
(1785) gives quite a bit of attention to the two kinds of errors
inherent in a judicial decision. The discussion of convicting
an innocent defendant is given on pages 123-127, and that of acquitting
the guilty on pages 233-241, of Condorcet's treatise. Condorcet would
like the probabilities of these two kinds of errors to be quite
small, and he provides some development of how he would determine
that a probability value is small. This also leads him to be an
advocate for the abolition of capital punishment. Since capital
punishment cannot be reversed, Condorcet feels that even though the
probability of convicting an innocent defendant may be quite small,
over a large number of cases the probability of at least one
innocent going to his or her death can be quite large. Condorcet's
work on juries was motivated by questions on probabilities of
judicial error, and interestingly he was encouraged and supported in
this work by Turgot, Controller General, one of Louis XVI's most
powerful ministers.
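Condorcet's argument about irreversible punishment rests on a small calculation. The sketch below assumes, as a simplification not stated in the text, that cases are independent and share one wrongful-conviction probability p; the values of p and the number of cases are hypothetical, and the function name is ours.

```python
def prob_at_least_one_wrongful(p: float, cases: int) -> float:
    """Chance of at least one wrongful conviction among `cases` independent
    trials, each with wrongful-conviction probability p: 1 - (1 - p)^cases.
    Independence and a common p are simplifying assumptions."""
    if not 0.0 <= p <= 1.0 or cases < 0:
        raise ValueError("need 0 <= p <= 1 and cases >= 0")
    return 1.0 - (1.0 - p) ** cases


# Hypothetical figures: a 1-in-1000 chance per case, over 1000 cases.
print(round(prob_at_least_one_wrongful(0.001, 1000), 3))   # 0.632
```

Even a per-case error of one in a thousand makes at least one wrongful conviction more likely than not over a thousand cases under these assumptions.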


Laplace asserts, however, that the probabilities of the two
kinds of judicial errors are very difficult to determine. In
considering this problem of errors in judicial decisions, Laplace
implicitly offers the following calculations. Let $\pi$ be the
personal probability, offered by a judge or juror after evaluating the
evidence, that a defendant is guilty. In doing this, the judge
recognizes that one of two kinds of errors may occur: erroneous
convictions or erroneous acquittals. For either one, there
is a cost to society or to the individual. Let us assume the loss
due to erroneous conviction is $L_C$, and similarly $L_A$ for erroneous
acquittal. These are values that can and do vary from society to
society and from crime to crime. However, given these values,
Laplace implicitly continues that a judge would prefer a value of
$\pi$ such that

$$\pi L_A > (1-\pi) L_C ,$$

that is, the expected loss due to an acquittal exceeds the expected
loss due to a conviction. This leads to

$$\pi > \frac{L_C}{L_A + L_C} = \alpha ,$$

and therefore a judge would convict when $\pi \ge \alpha$ (note $0 \le \pi \le 1$).
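The threshold rule can be sketched in a few lines of Python. The function names are ours, and the loss values in the example are hypothetical; neither Laplace nor the text assigns numbers to $L_C$ and $L_A$.

```python
def conviction_threshold(loss_conviction: float, loss_acquittal: float) -> float:
    """Return alpha = L_C / (L_A + L_C).

    At pi = alpha the expected loss of acquitting (pi * L_A) equals the
    expected loss of convicting ((1 - pi) * L_C); above it, acquitting
    is the costlier choice.
    """
    if loss_conviction <= 0 or loss_acquittal <= 0:
        raise ValueError("losses must be positive")
    return loss_conviction / (loss_acquittal + loss_conviction)


def judge_convicts(pi: float, loss_conviction: float, loss_acquittal: float) -> bool:
    """Apply the rule 'convict when pi >= alpha' for a personal probability pi."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    return pi >= conviction_threshold(loss_conviction, loss_acquittal)


# Hypothetical losses: an erroneous conviction counted as nine times as
# costly as an erroneous acquittal.
alpha = conviction_threshold(loss_conviction=9.0, loss_acquittal=1.0)
print(alpha)                           # 0.9
print(judge_convicts(0.95, 9.0, 1.0))  # True
print(judge_convicts(0.85, 9.0, 1.0))  # False
```

Equal losses give $\alpha = 1/2$; the more a society fears wrongful conviction relative to wrongful acquittal, the closer $\alpha$ moves to 1.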

Two judges can each have the same standard for probability of
guilt, namely $\pi \ge \alpha$, but of course they can differ in legal
acumen as to how well they do in relative frequency of correct
verdicts. Thus judges operating within this personal probability
structure will have some objective frequency of success. Let $x$ be
the relative frequency of success in verdicts rendered by a judge.
Laplace assumes $1/2 \le x \le 1$.

If we have n judges or a jury of n members of whom n-i
convict and i acquit the defendant, Laplace asserts that the
probability that the decision is just will be proportional to

$$x^{n-i}(1-x)^{i} ,$$

and likewise the probability that the opinion of the jury is not
just will be proportional to

$$(1-x)^{n-i}x^{i} .$$

Thus the probability of validity of the judgment of the jury is

$$\frac{x^{n-i}(1-x)^{i}}{x^{n-i}(1-x)^{i} + (1-x)^{n-i}x^{i}} .$$

In doing this Laplace is assuming that all the jurors share the
same value of x. Laplace also assumes that x is a priori equally
likely to take any value between zero and one, but that x for the
jurors will never be less than 1/2. He further states that for any
observed decision the jury is divided into two parts: n-i jurors
vote to convict the defendant and i jurors vote to acquit, and thus
the relative frequency of the observed event is proportional to

$$x^{n-i}(1-x)^{i} + (1-x)^{n-i}x^{i} .$$

From before, $x^{n-i}(1-x)^{i}$ is proportional to the probability
that the verdict is just and $(1-x)^{n-i}x^{i}$ is proportional to the
probability that the verdict is not just. Laplace has added these
together and states that the probability of n-i jurors voting for
conviction and i jurors voting for acquittal is proportional to the
sum of the two terms.
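A minimal sketch of the fixed-x calculation, before Laplace averages over x. It assumes a single hypothetical value of x shared by all jurors; the function name is illustrative.

```python
def prob_verdict_just(x: float, n: int, i: int) -> float:
    """Laplace's probability that a verdict of n - i (convict) to i (acquit)
    is just, when each juror is right with the same probability x.
    The binomial coefficient is common to both terms and cancels."""
    just = x ** (n - i) * (1 - x) ** i
    unjust = (1 - x) ** (n - i) * x ** i
    return just / (just + unjust)


# Hypothetical x = 0.75: 5 to 3 of eight and 7 to 5 of twelve both give
# x^2 / (x^2 + (1 - x)^2), about 0.9.
print(prob_verdict_just(0.75, 8, 3))
print(prob_verdict_just(0.75, 12, 5))
```

Because the common factor $x^{i}(1-x)^{i}$ cancels, only the margin n - 2i matters for fixed x, which is why the two calls agree.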

Each of these terms should be multiplied by the probability
that the defendant is guilty and not guilty, respectively. In the
Poisson model that we discuss shortly these are taken into account
explicitly, as is the probability that a juror will not make an
error. Laplace seems to assume that the probabilities of guilt and
innocence, a priori, are 1/2 each. At the end of his analysis,
Laplace defines P, the probability of a just verdict, as

$$P = \frac{\int_{1/2}^{1} x^{n-i}(1-x)^{i}\,dx}{\int_{0}^{1} x^{n-i}(1-x)^{i}\,dx} .$$

Note that he integrates x from 1/2 to 1, and that

$$\int_{0}^{1} x^{n-i}(1-x)^{i}\,dx = \int_{1/2}^{1} x^{n-i}(1-x)^{i}\,dx + \int_{1/2}^{1} x'^{\,i}(1-x')^{n-i}\,dx' ,$$

where $x' = 1-x$. Note also that the combinatorial coefficient
factor $\binom{n}{i}$ is missing, but for P it cancels out.

There is obviously some model inadequacy in this development,
and the work of Poisson and others will also bear similar frailties.
Among other things, the model assumes that x is the same for each
member of a jury or a panel of judges. It also assumes independence
of decision by the n jurors or judges. Moreover, the distinction
in judicial decision making between a panel of judges and
a jury merits additional thought.

In France, at one point during Laplace's time, there were
eight judges (jurors) and five determined a verdict. Essentially
an initial ballot determined the outcome. This was true also
shortly afterwards when seven jurors out of twelve (1825-30) and
then eight jurors out of twelve (1831-33) determined the outcome.
This is in contradistinction to systems where unanimity or something
close to it is required, as in the United States.

For the case of eight jurors and a verdict by a vote of exactly
five out of eight, the probability of an incorrect judgment is

$$1 - P = \frac{\int_{0}^{1/2} x^{5}(1-x)^{3}\,dx}{\int_{0}^{1} x^{5}(1-x)^{3}\,dx} .$$

From this equation we find 1-P = .2539; that is, a verdict carried
by the minimal majority of five judges out of eight will, under
Laplace's model, lead to an incorrect decision in roughly one out of
four cases. In other words, the defendant's risk of being unjustly
convicted and the risk of criminals escaping punishment would both be
rather high. Of course, this assumes that $x \ge 1/2$ is realistic and
that Laplace's model is valid. If we now sharpen the judges'
evaluation powers and assume $x \ge 4/5$, we find that 1-P = 0.0856,
and the chance of an incorrect verdict is reduced to about one out
of twelve. This is closer to values we obtain subsequently from
the Poisson model in early 19th century France and from its
modification and use in mid-twentieth century America.

On the other hand, if we still consider $x \ge 1/2$ but require a
jury of size 12 and unanimity, we find

$$1 - P = \frac{\int_{0}^{1/2} x^{12}\,dx}{\int_{0}^{1} x^{12}\,dx} = 0.0001221 ,$$

or only one error in 8192 cases. Poisson discusses this Laplace
result and shows that the probabilities of error in convictions
are 14/8192, 92/8192, 378/8192, 1093/8192, and 2380/8192 when
convictions are voted by 11 to 1, 10 to 2, 9 to 3, 8 to 4, and 7 to 5,
respectively. Thus with the smallest majority the probability of
error is approximately 2/7, so that out of a very large number of
accused convicted by a 7 to 5 majority, approximately 2/7 should
not have been. For conviction by a majority of 8 to 4, nearly 1/8 of
the convictions could be in error. In fact, Laplace suggests
that the decision rule should be at least 9 out of 12.
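The figures above can be checked exactly. The ratio defining 1-P is a regularized incomplete beta function, and for integer exponents it equals a binomial tail with success probability 1/2, so exact fractions come out of the standard library. The sketch below uses that identity (our choice of computation, not one attributed to Laplace or Poisson) and cross-checks it with Simpson's rule; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def laplace_error_exact(n: int, i: int) -> Fraction:
    """1 - P for a verdict of n - i to i under Laplace's model:
        int_0^{1/2} x^(n-i) (1-x)^i dx  /  int_0^1 x^(n-i) (1-x)^i dx.
    The ratio is a regularized incomplete beta function, equal to
    Pr[Binomial(n + 1, 1/2) >= n - i + 1], which gives an exact fraction."""
    if not 0 <= i < n - i:
        raise ValueError("need a majority: 0 <= i < n - i")
    tail = sum(comb(n + 1, k) for k in range(n - i + 1, n + 2))
    return Fraction(tail, 2 ** (n + 1))


def laplace_error_numeric(n: int, i: int, steps: int = 2000) -> float:
    """The same ratio by Simpson's rule (steps must be even), as a cross-check."""
    def f(x: float) -> float:
        return x ** (n - i) * (1 - x) ** i

    def simpson(a: float, b: float) -> float:
        h = (b - a) / steps
        total = f(a) + f(b)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * f(a + k * h)
        return total * h / 3

    return simpson(0.0, 0.5) / simpson(0.0, 1.0)


print(float(laplace_error_exact(8, 3)))   # 0.25390625  (5 to 3 of eight)
print(laplace_error_exact(12, 0))         # 1/8192      (unanimity of twelve)
for i in range(1, 6):                     # 11 to 1, ..., 7 to 5
    print(f"{12 - i} to {i}: {laplace_error_exact(12, i) * 8192}/8192")
```

The loop prints 14, 92, 378, 1093 and 2380 out of 8192, matching the values Poisson reports.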

Poisson stresses that these results from the Laplace analysis
assume the probability of guilt before trial is 1/2, an assumption
he considers unrealistic, and that the equation for P should read

$$P = \frac{\theta \int_{1/2}^{1} x^{n-i}(1-x)^{i}\,dx}{\theta \int_{1/2}^{1} x^{n-i}(1-x)^{i}\,dx + (1-\theta) \int_{0}^{1/2} x^{n-i}(1-x)^{i}\,dx} ,$$

where $\theta$ is the probability of guilt before the accused is brought
to trial. Once again the binomial factor $\binom{n}{i}$ is not written
because it cancels in the equation, and, of course, $\theta = 1/2$ gives the
Laplace result.
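Poisson's correction can be evaluated with the same binomial-tail identity, since the common factor $\int_0^1 x^{n-i}(1-x)^i\,dx$ cancels from numerator and denominator. In this sketch the prior $\theta$ is passed as a Fraction so results stay exact; $\theta = 2/3$ in the example is a hypothetical value chosen for illustration, and the function names are ours.

```python
from fractions import Fraction
from math import comb


def laplace_tail(n: int, i: int) -> Fraction:
    """int_0^{1/2} x^(n-i)(1-x)^i dx / int_0^1 x^(n-i)(1-x)^i dx,
    computed exactly as Pr[Binomial(n + 1, 1/2) >= n - i + 1]."""
    tail = sum(comb(n + 1, k) for k in range(n - i + 1, n + 2))
    return Fraction(tail, 2 ** (n + 1))


def poisson_prob_just(n: int, i: int, theta: Fraction) -> Fraction:
    """P with a prior probability of guilt theta; after cancelling
    int_0^1 x^(n-i)(1-x)^i dx only the two tail weights remain."""
    low = laplace_tail(n, i)   # weight of x in [0, 1/2]
    high = 1 - low             # weight of x in [1/2, 1]
    return theta * high / (theta * high + (1 - theta) * low)


# theta = 1/2 recovers Laplace: for 7 to 5, P = 1 - 2380/8192.
print(poisson_prob_just(12, 5, Fraction(1, 2)) == 1 - Fraction(2380, 8192))  # True
# A hypothetical theta = 2/3 raises P for the same 7 to 5 vote to about 0.830.
print(float(poisson_prob_just(12, 5, Fraction(2, 3))))
```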
Poisson also notes that the Laplace derivation assumes that
the likelihood of a juror not making an error is the same for all
jurors (or is the mean of a distribution of values over jurors), an
assumption both Poisson and we will also grant for a specific venire,
but where the mean can vary for different venires. In addition,
however, the Laplace structure includes nothing that depends on the
abilities of the jurors who render the verdict, except implicitly
through their estimate of the threshold value $\alpha$.

In short, we see that the Laplace analysis and its conclusions
depend only on jury size and the majority producing the verdict,
while Poisson asserts that $\theta$, the probability that a defendant is
guilty before the evidence is presented, and $\mu$, the probability
that a juror will not make an error, are two additional parameters
to consider in jury analyses. The value of $\theta$ is a reflection of
the society in which jury decisions are made. Poisson is quite
sensitive to the fact that in a tranquil society $\theta > 1/2$ but that,
say during the French Revolution, $\theta$ could be considerably smaller than
1/2. The value of $\mu$ should depend on the characteristics of the
venire from which a juror is drawn. Poisson desires that $\mu > 1/2$,
just as Laplace required $x > 1/2$ for each judge in his model. At
any rate, the computation of probabilities of incorrect verdicts
should be based on these values.

Poisson Jury Model

Let us now look into the Poisson model in some detail. It is
important to note that Poisson, in developing this model, paid heed
to the data available in his day. For the period 1825-30, jury
decisions were based on seven or more out of twelve jurors favoring
either conviction or acquittal. Cases with verdicts of exactly
seven out of twelve went to a higher court, which could change the
verdict. For each year, the number of trials and number of
convictions were listed for crimes against persons and crimes against
property. In the period 1831-33, listings were also available,
except that the majority required was eight or more out of twelve. In
1832 and 1833, the jury could find extenuating circumstances in a
conviction that would lead to a lighter penalty.

What impressed Poisson was the stability of the conviction
ratios over each of the years 1825-1830 and 1832-1833. He felt this
was a basis for developing a model that could reproduce the data in
some parsimonious way and, if so, lead to the computation of the
probabilities of the two kinds of errors important in judging the
effects of size and decision making of a jury, namely, the
probability of convicting an innocent defendant and the probability
of acquitting a guilty defendant. Tables 1 and 2 are taken
from Poisson's work and show the stability of conviction ratios
noted by him. Note that in 1831 the conviction ratio (.5388) is
somewhat less than in 1832 and 1833 (.5890), even though 8 or more
out of 12 are required for a conviction in all three years;
extenuating circumstances leading to reduced sentences are permitted
in 1832 and 1833, thus possibly serving as a factor to increase the
conviction ratio. These conviction ratios are, of course, smaller
than for the years 1825-1830.

To check on the homogeneity of the annual proportions of
conviction over the years 1825-1830, Poisson divided the six years
into two groups, 1825-1827 and 1828-1830, and tested the difference
of the proportion of conviction in each period employing the normal
approximation to the binomial. He concluded there was no difference.
Since the $\chi^2$ goodness-of-fit test over the six years is now
available in our statistical armory, the homogeneity hypothesis
was tested in this manner employing a $\chi^2$ with five degrees of
freedom. We computed $\chi^2_5 = 14.04$ from the data in Table 1. Thus
homogeneity is rejected at the .05 level of significance but accepted
at the .01 level of significance. The principal contribution to
statistical significance comes in 1830, and Poisson, in his work,
remarks that the proportion of convictions in that year may be a
little out of line. If we omit 1830, we compute $\chi^2_4 = 4.85$,
which indicates no significance at the .05 level and thus
homogeneity over the five years 1825-1829.
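The homogeneity test can be sketched with the standard library alone. Table 1 is not reproduced in this section, so the function takes annual counts of convictions and accused as inputs rather than using Poisson's data; the only numbers in the example are the two statistics quoted above. The upper-tail probability uses a standard recurrence for integer degrees of freedom; with SciPy available, `scipy.stats.chi2.sf` would serve the same purpose. Function names are ours.

```python
import math


def homogeneity_chi2(convicted, accused):
    """Pearson chi-square statistic for equal conviction proportions across
    years (a 2 x k table of convicted / acquitted); df = k - 1."""
    p = sum(convicted) / sum(accused)          # pooled conviction ratio
    stat = 0.0
    for c, n in zip(convicted, accused):
        for observed, expected in ((c, n * p), (n - c, n * (1 - p))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(accused) - 1


def chi2_sf(x: float, df: int) -> float:
    """Pr[chi-square with df degrees of freedom > x], integer df >= 1, via
    Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x, 1) = erfc(sqrt(x/2)) or Q(x, 2) = e^(-x/2)."""
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


print(round(chi2_sf(14.04, 5), 3))   # about 0.015: significant at .05, not at .01
print(round(chi2_sf(4.85, 4), 3))    # about 0.303: not significant at .05
```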
TABLE II

Number of Accused, Jury Decisions, and Estimates of $\Gamma_{12,4}$ by Year
in the Years 1831-1833 in France

                               Crimes                              1832 and
     Statistic                 against       1831    1832    1833      1833
(1)  Number of accused                       7606    7555    6964    14,519
(2)  Number of accused         Person        2046      --      --     4,108
(3)  Number of accused         Property      5560      --      --    10,421
(4)  Number of convicted                     4098    4448    4105     8,553
(5)  Number of convicted       Person         743      --      --     1,889
(6)  Number of convicted       Property      3355      --      --     6,665
     Estimates of $\Gamma_{12,4}$
(7)  Conviction ratios         Total        .5388   .5887   .5895     .5890
(8)  Conviction ratios         Person       .3631      --      --     .4598
(9)  Conviction ratios         Property     .6034      --      --     .6395

From our previous discussion of Poisson's criticism of
Laplace, we are aware of his concern to include two parameters:
$\theta$, the probability that the accused is guilty before the evidence is
presented to the jury, and $\mu$, the probability that a juror will not
make an error. The first parameter is a commentary on the society
and its law enforcement procedures, and the second relates to how
well a selected juror can sift through and assess evidence. We now
list $\theta$, $\mu$, and the following definitions to develop the model:
$P_C$, the probability of a conviction; $P_A$, the probability of an
acquittal; $P_{G/A}$, the probability of guilt given an acquittal; and
$P_{\bar G/C}$, the probability of innocence given a conviction. For an
attempt at a model employing only one parameter, the reader is
referred to Walbert (1971).

Subsequently, when we modify the model to make it more
appropriate for the American experience, we will add $P_H$, the
probability of a hung jury, and instead of $\mu$ employ $\mu_1$, the
probability that a juror will vote guilty given the accused is guilty,
and $\mu_2$, the probability that a juror will vote for acquittal given
the accused is innocent. A word about $P_{G/A}$ and $P_{\bar G/C}$ is in
order. By guilt we mean 'convictable' and by innocence we mean
'nonconvictable' on the basis of the evidence. Only some higher being
(sometimes not even the defendant) can know the true situation.
Empirically, the decision of a judge can be and is taken as the anchor
and compared with jury decisions to estimate these errors; we will
look into this later to compare the results with values obtained from
our models and the models of Poisson.

Since in Poisson's day the majority required for a decision was
first seven out of twelve and then eight out of twelve, essentially
an initial ballot could suffice. Thus the probability of conviction,
$P_C$, is the probability that, say, i jurors vote for acquittal
where $i \le 5$ or $i \le 4$. We can determine the probability that i
jurors out of n vote for acquittal in the following way. Assume
n = 1; then write

$$P_C = P_{CG} + P_{C\bar G} ,$$

where $P_{CG}$ is the joint probability of conviction and guilt and
$P_{C\bar G}$ is the joint probability of conviction and innocence. Also

$$P_C = P_{C/G}\,P_G + P_{C/\bar G}\,P_{\bar G} ,$$

where $P_{C/G}$ is the conditional probability of conviction given guilt
and $P_{C/\bar G}$ is the conditional probability of conviction given
innocence. But $P_{C/G} = \mu$ and $P_{C/\bar G} = 1-\mu$; hence, with
$P_G = \theta$,

$$P_C = P_G\,\mu + P_{\bar G}(1-\mu) = \theta\mu + (1-\theta)(1-\mu) ,$$

and since $P_A = 1 - P_C$,

$$P_A = \theta(1-\mu) + (1-\theta)\mu ,$$

which, for a single juror, is the probability that the juror will
vote for acquittal.

Let $\gamma_{n,i}$ be the probability that exactly i jurors out of n vote
for acquittal; then $\gamma_{1,0} = 1 - P_A = P_C$ and $\gamma_{1,1} = P_A$.
If n = 2, and $\gamma_{2,i}$ is the probability that exactly i jurors out
of two vote for acquittal, we have

$$\gamma_{2,0} = \theta\mu^{2} + (1-\theta)(1-\mu)^{2}$$

$$\gamma_{2,1} = 2\theta\mu(1-\mu) + 2(1-\theta)(1-\mu)\mu$$

$$\gamma_{2,2} = \theta(1-\mu)^{2} + (1-\theta)\mu^{2} .$$

Thus for a jury of size n, we get

$$\gamma_{n,i} = \binom{n}{i}\left[\theta\mu^{n-i}(1-\mu)^{i} + (1-\theta)\mu^{i}(1-\mu)^{n-i}\right] .$$

Note that the two terms in the brackets are respectively the 'guilty'
component, where the i votes for acquittal are in error, and the
'not-guilty' component, where the i votes for acquittal are not in
error. Terms very similar to these two terms have appeared in the
Laplace development, where $\mu$ is replaced by x and $\theta = 1/2$.
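A direct transcription of $\gamma_{n,i}$ makes the two components explicit. The values $\theta = .64$ and $\mu = .75$ are Poisson's overall estimates reported later in this section, used here only for illustration; the function name is ours.

```python
from math import comb, isclose


def gamma_ni(n: int, i: int, theta: float, mu: float) -> float:
    """Probability that exactly i of n jurors vote to acquit on one ballot:
    C(n, i) [theta mu^(n-i) (1-mu)^i + (1-theta) mu^i (1-mu)^(n-i)].
    First term: accused guilty, so the i acquittal votes are errors.
    Second term: accused innocent, so the i acquittal votes are correct."""
    guilty = theta * mu ** (n - i) * (1 - mu) ** i
    innocent = (1 - theta) * mu ** i * (1 - mu) ** (n - i)
    return comb(n, i) * (guilty + innocent)


theta, mu = 0.64, 0.75   # illustrative values (Poisson's overall estimates)

# n = 1 reproduces P_C = theta mu + (1 - theta)(1 - mu) and P_A = 1 - P_C.
print(gamma_ni(1, 0, theta, mu), gamma_ni(1, 1, theta, mu))   # about 0.57 and 0.43

# n = 2 matches the expansion of gamma_{2,1} written out above.
assert isclose(gamma_ni(2, 1, theta, mu),
               2 * theta * mu * (1 - mu) + 2 * (1 - theta) * (1 - mu) * mu)

# The probabilities for a jury of twelve sum to one.
assert isclose(sum(gamma_ni(12, i, theta, mu) for i in range(13)), 1.0)
```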

Define

$$\Gamma_{12,5} = \sum_{i=0}^{5} \gamma_{12,i} , \qquad \Gamma_{12,4} = \sum_{i=0}^{4} \gamma_{12,i} ;$$

these are the probabilities of conviction when the majorities
required for conviction are 7 or more out of 12 and 8 or more out of
12, respectively. Estimates of $\Gamma_{12,5}$ and $\Gamma_{12,4}$ can be
secured from the French data, and thus one can produce two equations
in two unknowns, namely $\mu$ and $\theta$. In 1825-30, if a conviction was
based on exactly seven out of twelve, another court intervened, and
thus the number of such cases was known by year. Since
$\Gamma_{12,5} - \Gamma_{12,4} = \gamma_{12,5}$, another anchor is provided to
check on the model. The estimates of $\gamma_{12,5}$ found in this way,
when checked against the empirical values, reinforced the use of the
model. One is faced with two equations of high degree in $\mu$ and $\theta$,
but Poisson had some ingenious methods for making the solutions
feasible.
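Poisson's own methods for the two high-degree equations are not described here, so the sketch below substitutes a plain numerical approach: because $\Gamma_{12,k}$ is linear in $\theta$, each equation gives $\theta$ as a function of $\mu$, and bisection on $\mu$ (restricted to $\mu > 1/2$) finds where the two agree. As a round-trip check it recovers $\theta$ and $\mu$ from conviction probabilities generated by the model itself, not from the French data; the bracket, tolerance and function names are our own choices.

```python
from math import comb


def gamma_cum(n: int, k: int, theta: float, mu: float) -> float:
    """Gamma_{n,k}: probability of at most k acquittal votes on one ballot."""
    return sum(
        comb(n, i) * (theta * mu ** (n - i) * (1 - mu) ** i
                      + (1 - theta) * mu ** i * (1 - mu) ** (n - i))
        for i in range(k + 1)
    )


def theta_given_mu(target: float, n: int, k: int, mu: float) -> float:
    """Gamma_{n,k} is linear in theta, so one equation fixes theta for each mu."""
    a = gamma_cum(n, k, 1.0, mu)   # value if the accused is surely guilty
    b = gamma_cum(n, k, 0.0, mu)   # value if the accused is surely innocent
    return (target - b) / (a - b)


def solve_theta_mu(g5: float, g4: float, n: int = 12,
                   lo: float = 0.51, hi: float = 0.999, tol: float = 1e-12):
    """Find (theta, mu) with Gamma_{n,5} = g5 and Gamma_{n,4} = g4 by bisecting
    on mu in [lo, hi] until the two implied values of theta agree."""
    def gap(mu: float) -> float:
        return theta_given_mu(g5, n, 5, mu) - theta_given_mu(g4, n, 4, mu)

    if gap(lo) * gap(hi) > 0:
        raise ValueError("no sign change in [lo, hi]; widen or move the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    mu = (lo + hi) / 2
    return theta_given_mu(g5, n, 5, mu), mu


# Round trip: generate the two conviction probabilities from theta = .64,
# mu = .75, then recover the parameters from them.
g5 = gamma_cum(12, 5, 0.64, 0.75)   # about 0.610 (7 or more of 12 convict)
g4 = gamma_cum(12, 4, 0.64, 0.75)   # about 0.540 (8 or more of 12 convict)
print(solve_theta_mu(g5, g4))       # approximately (0.64, 0.75)
```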
Over all trials, Poisson obtained the estimates $\theta = .64$,
$\mu = .75$; for crimes against persons, the estimates are $\theta = .54$,
$\mu = .68$; for crimes against property, the estimates are $\theta = .67$,
$\mu = .78$. This demonstrates how $\theta$ and $\mu$ can easily vary with the
criminal charge. Also, while we treat them as independent variables,
this is not necessarily so; in fact, $\mu$ can also vary with n, the
size of the jury, and $\theta$ itself could in some societal contexts
depend on n. For purposes of exposition and comparison, we will
employ $\theta = .64$ and $\mu = .75$, since felony trials in the
United States for which we have data are based on crimes against
both persons and property.

Poisson is quite aware of the complementary nature of $\theta$ and $\mu$,
namely that $(1-\theta)$ and $(1-\mu)$ will produce the same probability of
conviction in his model. He comments that the high proportion of
convictions during the period of the French Revolution cannot be
employed to suggest fairness, equity, or reasonableness, since the
values $\theta = .36$ and $\mu = .25$ yield the same values for $P_C$ as
$\theta = .64$, $\mu = .75$ that were derived from his model and the data
of 1825-30 and 1831-33. Thus bringing to trial an individual whose
prior probability of guilt is about 1/3, where jurors can be in error
3/4 of the time, gives (in the seven-or-more-out-of-twelve situation)
$P_C = .61$, just as in the case where the probability of juror error
equals 1/4 and the probability of guilt before trial is about 2/3.
If we assume $\mu \ge 1/2$ and $\theta \ge 1/2$, then the $\mu$, $\theta$ solutions
are unique.
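The complementarity is easy to confirm numerically: replacing $(\theta, \mu)$ by $(1-\theta, 1-\mu)$ exchanges the two bracketed terms of $\gamma_{n,i}$, so every $\gamma_{n,i}$, and hence $P_C$, is unchanged. A minimal sketch with illustrative function names:

```python
from math import comb, isclose


def gamma_ni(n, i, theta, mu):
    """Exactly i acquittal votes out of n on one ballot (Poisson's model)."""
    return comb(n, i) * (theta * mu ** (n - i) * (1 - mu) ** i
                         + (1 - theta) * mu ** i * (1 - mu) ** (n - i))


def prob_conviction(n, max_acquit, theta, mu):
    """Probability of conviction when at most max_acquit jurors may vote to acquit."""
    return sum(gamma_ni(n, i, theta, mu) for i in range(max_acquit + 1))


# Swapping (theta, mu) for (1 - theta, 1 - mu) leaves each gamma_{12,i} unchanged.
for i in range(13):
    assert isclose(gamma_ni(12, i, 0.64, 0.75), gamma_ni(12, i, 0.36, 0.25))

# Seven or more of twelve to convict: both parameter pairs give about 0.61.
print(round(prob_conviction(12, 5, 0.64, 0.75), 2),
      round(prob_conviction(12, 5, 0.36, 0.25), 2))   # 0.61 0.61
```

This is why, as Poisson notes, a conviction rate alone cannot distinguish a careful jury facing mostly guilty defendants from an error-prone jury facing mostly innocent ones; the restriction $\mu \ge 1/2$, $\theta \ge 1/2$ is what selects one solution.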

Armed with the results of his model, Poisson proceeds to
estimate the two kinds of jury errors. For the period 1825-1830, he
estimates that the probability of convicting an innocent defendant is
.06 (over person and property crimes) and the probability of
acquitting a guilty defendant is .18. Poisson gives more results, but the
figures just cited will suffice to provide a basis for comparison
with his 20th century successors.

To summarize, the Poisson jury model seems to serve the French
jury experience quite well. There are only two parameters to
produce a rather parsimonious accounting of French jury decisions in
the period 1825-1833. The data on hand, e.g., the proportions of
convictions by 7 or more out of 12, by 8 or more out of 12, and by
exactly 7 out of 12, permit the development of two equations with two
unknowns under the implicit assumption that one ballot is required.
Indeed, requiring a majority of only 7 or 8 out of 12 jurors to
produce a verdict essentially leads to only one ballot. The two
parameters, $\theta$, the probability the defendant is guilty before the
trial begins and evidence is presented, and $\mu$, the probability a
juror will not make an error, are latent parameters. Estimates of
these parameters are produced from the data on proportions of jury
convictions. Poisson computed these values by solving equations of
high degree, and he essentially employed the method of moments to
obtain these estimates. The equations, of course, derive from

$$\gamma_{n,i} = \binom{n}{i}\left[\theta\mu^{n-i}(1-\mu)^{i} + (1-\theta)\mu^{i}(1-\mu)^{n-i}\right] .$$


The American Jury

The application of Poisson's model to the American experience
requires modifications. In most felony trials unanimity is required
and there are 12 lay jurors. If an initial ballot does not produce
unanimity, jury deliberations take place until unanimity is
achieved or there is a hopeless deadlock in achieving this goal.
Therefore, in addition to $\mu$ and $\theta$ we require some modeling of the
deliberative process leading to conviction, acquittal, or stalemate
if the initial ballot does not reflect unanimity. If initial-ballot
American data are available, one can estimate $\mu$ and $\theta$, since we
could consider this somewhat analogous to the French jury situation.

While much jury data may exist in raw form in many state and
federal archives, there is only one published account of initial
ballot results and final decisions of some juries. These data
appear in Kalven and Zeisel (1966). For 225 juries, each of size 12,
the votes on the initial ballot and the juries' final decisions are
reported. Table 3 lists these data. They will permit us to obtain
estimates of $\mu$ and $\theta$ from initial ballot data for the American
scene. In fact, we will consider $\mu_1$, the probability the juror will
vote guilty given the defendant is guilty, and $\mu_2$, the probability
the juror will vote not guilty given the defendant is innocent; that
is, $\mu$, the probability the juror will not make an error, is
sharpened into two parameters. We will then require some modeling to
take us from the initial ballot to the final decision, and the data in
Table 3 will also be helpful in this regard, although we jump from
initial ballot to final decision in one step. In a subsequent section
we will discuss going from initial ballot to final decision in several
steps, but will be hampered by the lack of data on what goes on in the
American jury room.

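To include $\mu_1$ and $\mu_2$ in $\gamma_{n,i}$ we can slightly revise $\gamma_{n,i}$ as it appeared in the previous section to obtain

$$\gamma_{n,i} = \binom{n}{i}\left[\theta\,\mu_1^{\,n-i}(1-\mu_1)^{i} + (1-\theta)\,\mu_2^{\,i}(1-\mu_2)^{n-i}\right],$$

where $\gamma_{n,i}$ is the probability that a jury of size n casts i votes for acquittal on the first ballot. We also have

$$\Gamma_{n,i} = \sum_{j=0}^{i}\gamma_{n,j},$$

where $\Gamma_{n,i}$ is the probability of at most i votes for acquittal on the first ballot, and we can define

$$\rho_{n,i} = \frac{\binom{n}{i}\,\theta\,\mu_1^{\,n-i}(1-\mu_1)^{i}}{\gamma_{n,i}},$$

where $\rho_{n,i}$ is the probability that the accused is guilty given exactly i votes for acquittal on the first ballot. Likewise

$$\bar{\rho}_{n,i} = \frac{1}{\Gamma_{n,i}}\sum_{j=0}^{i}\rho_{n,j}\,\gamma_{n,j},$$

where $\bar{\rho}_{n,i}$ is the probability that the accused is guilty given at most i votes for acquittal on the first ballot.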
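These definitions can be checked numerically. Below is a minimal Python sketch. The function names `gamma`, `rho`, `big_gamma` and `rho_bar` are hypothetical. If only `mu1` is passed, the code uses the two-parameter model with $\mu_1 = \mu_2 = \mu$.

```python
from math import comb


def gamma(n, i, theta, mu1, mu2=None):
    """gamma_{n,i}: probability of exactly i acquittal votes on the first ballot.

    A guilty defendant (prior probability theta) gets each acquittal vote with
    probability 1 - mu1; an innocent one gets each with probability mu2.
    """
    if mu2 is None:
        mu2 = mu1  # two-parameter model: a single error rate
    guilty = theta * mu1 ** (n - i) * (1 - mu1) ** i
    innocent = (1 - theta) * mu2 ** i * (1 - mu2) ** (n - i)
    return comb(n, i) * (guilty + innocent)


def rho(n, i, theta, mu1, mu2=None):
    """rho_{n,i}: P(guilty | exactly i acquittal votes), by Bayes' rule."""
    guilty_joint = comb(n, i) * theta * mu1 ** (n - i) * (1 - mu1) ** i
    return guilty_joint / gamma(n, i, theta, mu1, mu2)


def big_gamma(n, i, theta, mu1, mu2=None):
    """Gamma_{n,i}: probability of at most i acquittal votes."""
    return sum(gamma(n, j, theta, mu1, mu2) for j in range(i + 1))


def rho_bar(n, i, theta, mu1, mu2=None):
    """P(guilty | at most i acquittal votes)."""
    num = sum(rho(n, j, theta, mu1, mu2) * gamma(n, j, theta, mu1, mu2)
              for j in range(i + 1))
    return num / big_gamma(n, i, theta, mu1, mu2)


# Posterior probability of guilt after a 6-6 first ballot, theta = .69, mu = .88
tie_posterior = rho(12, 6, 0.69, 0.88)
```

Under the two-parameter model, an evenly split first ballot leaves the prior unchanged. The binomial factors cancel in $\rho_{12,6}$, so `tie_posterior` equals θ = .69.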


Other  Estimation  Approaches  and  Extension  of  the  Model 

Let us consider Table 3 in the following way. We can think of the first-ballot results for the 225 trials as 225 independent observations from a five-cell multinomial distribution. Under the two-parameter model the cell probabilities are

$$p_1 = \gamma_{12,0},\quad p_2 = \sum_{i=1}^{5}\gamma_{12,i},\quad p_3 = \gamma_{12,6},\quad p_4 = \sum_{i=7}^{11}\gamma_{12,i},\quad p_5 = \gamma_{12,12},$$

respectively. Under the three-parameter model the cell probabilities, denoted by primes, are identical to the above with $\gamma'$ replacing $\gamma$. Two estimation approaches arise naturally from such a basis: the method of maximum likelihood and the method of modified minimum $\chi^2$. In either case and under either model we restrict ourselves to solutions for the parameters on the interval [1/2, 1), since it is difficult to believe μ < 1/2 or θ < 1/2 in American society.

In the case of the maximum likelihood estimators we examine

$$L(\mu,\theta) = c\,p_1^{43}\,p_2^{105}\,p_3^{10}\,p_4^{41}\,p_5^{26}$$

over the range 1/2 ≤ μ < 1, 1/2 ≤ θ < 1, where the exponents are the numbers of trials in each first-ballot acquittal-vote category; and

$$L(\mu_1,\mu_2,\theta) = c\,(p_1')^{43}(p_2')^{105}(p_3')^{10}(p_4')^{41}(p_5')^{26}$$

over the range 1/2 ≤ $\mu_1$ < 1, 1/2 ≤ $\mu_2$ < 1, 1/2 ≤ θ < 1. The unique solutions are $\hat\mu = .88$, $\hat\theta = .69$ and $\hat\mu_1 = .86$, $\hat\mu_2 = .92$, $\hat\theta = .70$, respectively (Gelfand and Solomon, 1977).
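The paper does not say how the likelihood was maximized. Below is a minimal sketch for the two-parameter model. It assumes a plain grid search over [1/2, 1) × [1/2, 1), which is a swapped-in technique. The helper names are hypothetical.

```python
from math import comb, log

# First-ballot counts from Table 3 for the categories 0, 1-5, 6, 7-11 and 12
# votes for acquittal (the exponents in the likelihood above).
COUNTS = [43, 105, 10, 41, 26]


def gamma(n, i, theta, mu):
    """Two-parameter first-ballot probability gamma_{n,i}."""
    return comb(n, i) * (theta * mu ** (n - i) * (1 - mu) ** i
                         + (1 - theta) * mu ** i * (1 - mu) ** (n - i))


def cell_probs(theta, mu, n=12):
    """Collapse gamma_{12,0..12} into the five multinomial cells p1..p5."""
    g = [gamma(n, i, theta, mu) for i in range(n + 1)]
    return [g[0], sum(g[1:6]), g[6], sum(g[7:12]), g[12]]


def log_likelihood(theta, mu):
    """log L(mu, theta), dropping the constant log c."""
    return sum(c * log(p) for c, p in zip(COUNTS, cell_probs(theta, mu)))


def mle_grid(step=0.005):
    """Maximize over a grid on [1/2, 1) x [1/2, 1); returns (loglik, theta, mu)."""
    grid = [0.5 + k * step for k in range(round(0.5 / step))]
    return max((log_likelihood(t, m), t, m) for t in grid for m in grid)


best_ll, theta_hat, mu_hat = mle_grid()
```

With a step of .005, the grid maximum should land close to the published $\hat\mu = .88$, $\hat\theta = .69$. The three-parameter likelihood can be searched the same way by giving the guilty and innocent components separate error rates.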

As for the modified minimum $\chi^2$ estimators, we minimize

$$\chi^2(\mu,\theta) = \sum_{i=1}^{5}\frac{(O_i - 225\,p_i)^2}{O_i}$$

and

$$\chi^2(\mu_1,\mu_2,\theta) = \sum_{i=1}^{5}\frac{(O_i - 225\,p_i')^2}{O_i},$$

where $O_1 = 43$, $O_2 = 105$, $O_3 = 10$, $O_4 = 41$ and $O_5 = 26$, and the ranges of the parameters are restricted as above. The unique solutions are μ = .84, θ = .66 and $\mu_1$ = .92, $\mu_2$ = .92, θ = .76, respectively. The results of these two estimation procedures, along with estimates by the method of moments, are displayed in Table 4 and indicate reasonably good agreement.
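The modified minimum $\chi^2$ criterion (Neyman's form, with the observed counts in the denominators) can be minimized in the same way. Below is a sketch for the two-parameter model, again assuming a grid search and hypothetical helper names.

```python
from math import comb

OBSERVED = [43, 105, 10, 41, 26]  # O_1, ..., O_5 from Table 3


def gamma(n, i, theta, mu):
    """Two-parameter first-ballot probability gamma_{n,i}."""
    return comb(n, i) * (theta * mu ** (n - i) * (1 - mu) ** i
                         + (1 - theta) * mu ** i * (1 - mu) ** (n - i))


def cell_probs(theta, mu, n=12):
    """Five multinomial cells: 0, 1-5, 6, 7-11, 12 acquittal votes."""
    g = [gamma(n, i, theta, mu) for i in range(n + 1)]
    return [g[0], sum(g[1:6]), g[6], sum(g[7:12]), g[12]]


def modified_chi2(theta, mu):
    """Sum of (O_i - 225 p_i)^2 / O_i over the five cells."""
    total = sum(OBSERVED)  # 225 trials
    return sum((o - total * p) ** 2 / o
               for o, p in zip(OBSERVED, cell_probs(theta, mu)))


def min_chi2_grid(step=0.005):
    """Minimize over a grid on [1/2, 1) x [1/2, 1); returns (chi2, theta, mu)."""
    grid = [0.5 + k * step for k in range(round(0.5 / step))]
    return min((modified_chi2(t, m), t, m) for t in grid for m in grid)


chi2_min, theta_tilde, mu_tilde = min_chi2_grid()
```

Dividing by $O_i$ rather than by $225\,p_i$ is what distinguishes the "modified" criterion. It keeps the objective a simple quadratic in the fitted cell probabilities.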

TABLE IV

Estimates of the Model Parameters

                                   2-Parameter Model     3-Parameter Model
Case I: Modified minimum χ²        μ = .84, θ = .66      μ1 = .92, μ2 = .92, θ = .76
Case II: Maximum likelihood        μ = .88, θ = .69      μ1 = .86, μ2 = .92, θ = .70
Case III: Method of moments        (illegible)           μ1 = .90, μ2 = .92, θ = (illegible)

Consideration of the situation in terms of a multinomial distribution opens the possibility of a wide variety of extensions of the basic model, limited only by the availability of data. For example, starting with the two-parameter model, one might instead wish to think of juries composed of fixed numbers of men, $n_m$, and of women, $n_w$ ($n_m + n_w = n$), or perhaps of juries composed of $n_b$ blacks and $n_w$ whites ($n_b + n_w = n$). Instead of a common μ for all jurors, associate a $\mu_m$ and $\mu_w$ with male and female jurors, respectively (similarly for blacks and whites). Such additional parametrization leads us to treat I, the number of first-ballot votes for acquittal, as the sum of two random variables that are independent given the defendant's actual guilt or innocence: the number of male votes for acquittal plus the number of female votes for acquittal (and similarly for blacks and whites). Thus the distribution of I results from a convolution. In the male-female case, writing $I_m$ and $I_w$ for the male and female votes for acquittal and G for guilt,

$$P(I=i \mid G) = \sum_{k=\max(0,\,i-n_w)}^{\min(n_m,\,i)} P(I_m=k \mid G)\,P(I_w=i-k \mid G),$$

with the analogous expression given innocence, and $P(I=i) = \theta\,P(I=i \mid G) + (1-\theta)\,P(I=i \mid \bar{G})$.
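Below is a minimal sketch of the male-female extension. It assumes that the two groups vote independently given the defendant's status. The group sizes and error rates in the example are hypothetical, not estimates from any data.

```python
from math import comb


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for 0..n successes."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """Distribution of X + Y for independent X ~ a and Y ~ b on 0, 1, 2, ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def first_ballot_by_group(theta, mu_m, mu_w, n_m, n_w):
    """P(I = i) when men and women have separate error rates.

    Within each verdict state the two groups vote independently, so we
    convolve there and then mix the two states with the prior theta.
    """
    guilty = convolve(binom_pmf(n_m, 1 - mu_m), binom_pmf(n_w, 1 - mu_w))
    innocent = convolve(binom_pmf(n_m, mu_m), binom_pmf(n_w, mu_w))
    return [theta * g + (1 - theta) * h for g, h in zip(guilty, innocent)]


# Hypothetical illustration: 7 men and 5 women with different error rates.
dist = first_ballot_by_group(0.69, 0.86, 0.90, 7, 5)
```

When $\mu_m = \mu_w$, the convolution collapses back to the ordinary two-parameter $\gamma_{12,i}$, which is a convenient sanity check.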


From this example it is clear that additional complexity can be inserted into the model and that the three-parameter model can be extended similarly. The number and definition of the multinomial cells are flexible. Hence, with appropriately gathered first-ballot data and effective computer programs, the parametric estimation possibilities are quite broad. At present, the prospects for obtaining such data are 'at best' slim.

In examining the distribution of I, the number of first-ballot votes for acquittal, we notice that under either the two- or three-parameter model it is a mixture of binomials. In fact, under the two-parameter model

$$P(I=i) = \gamma_{n,i} = \theta\,P[I=i \mid I\sim B(n,1-\mu)] + (1-\theta)\,P[I=i \mid I\sim B(n,\mu)]$$

and under the three-parameter model

$$P(I=i) = \gamma'_{n,i} = \theta\,P[I=i \mid I\sim B(n,1-\mu_1)] + (1-\theta)\,P[I=i \mid I\sim B(n,\mu_2)],$$

where B(n,p) is the binomial with parameters n and p. Apart from the earlier discussion leading to these models, such a mixture is ideal for describing the expected bimodal distribution of first-ballot votes. We have E(I) = n[θ(1-μ) + μ(1-θ)] and n[θ(1-$\mu_1$) + $\mu_2$(1-θ)], respectively. The binomial variance within the guilty and innocent components, averaged over the two, is nμ(1-μ) and n[θ$\mu_1$(1-$\mu_1$) + (1-θ)$\mu_2$(1-$\mu_2$)], respectively. Under the range of parameter values suggested by Table 4, and under either model with n = 12, E(I) is approximately four and this within-component variance is approximately one. The unconditional Var(I) is considerably larger, since it also contains the spread between the two component means: θ(1-θ)n²(2μ-1)² under the two-parameter model. This is the bimodality noted above. Since Table 4 suggests little difference between $\mu_1$ and $\mu_2$, we shall use the two-parameter model (θ, μ) and its estimates for the remainder of this exposition.
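The mean and variance statements can be verified directly from $\gamma_{12,i}$. This sketch uses the maximum likelihood values θ = .69, μ = .88 and hypothetical helper names. It checks that the total variance splits into the within-component and between-component parts.

```python
from math import comb


def gamma_dist(n, theta, mu):
    """Two-parameter first-ballot distribution gamma_{n,0..n}."""
    return [comb(n, i) * (theta * mu ** (n - i) * (1 - mu) ** i
                          + (1 - theta) * mu ** i * (1 - mu) ** (n - i))
            for i in range(n + 1)]


def mixture_moments(n, theta, mu):
    """Mean and variance of I, plus the two parts of the variance."""
    g = gamma_dist(n, theta, mu)
    mean = sum(i * p for i, p in enumerate(g))
    var = sum(i * i * p for i, p in enumerate(g)) - mean ** 2
    within = n * mu * (1 - mu)  # E[Var(I | guilt status)]
    between = theta * (1 - theta) * (n * (2 * mu - 1)) ** 2  # Var(E[I | status])
    return mean, var, within, between


mean, var, within, between = mixture_moments(12, 0.69, 0.88)
# mean is about 4.27, within about 1.27, and var about 19.06
```

The between-component part dominates. That dominance is the numerical signature of the bimodal first-ballot distribution.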

The values for θ and μ that we have computed from the Kalven-Zeisel data result from initial ballot responses for American juries of size 12. Juries in Poisson's day essentially were, or could be conceived of as, one-ballot juries. Hence 19th century French and 20th century American values for θ and μ may be commensurate. The American θ is a bit higher, and the American μ is quite a bit higher, than their French counterparts. Assuming this conclusion is valid, it would be interesting to examine the difference in the μ's: for example, is the American juror more sophisticated? As for θ, are the search and interrogation process and the public climate on crime in America doing a better job of bringing miscreants to trial? This comparison can be somewhat misleading, for two reasons. First, the difference in the θ's is not great. Second, defendants who go to jury trial are the very few for whom plea bargaining has not been successful, whose crime is serious, and against whom the evidence of guilt is not ironclad.

In order to provide fully for an American jury model, it is necessary to allow for jury deliberation if the initial ballot does not yield unanimity for conviction or acquittal. Our search for the conditional probabilities of innocence given conviction and of guilt given acquittal must rely on an explication of the full model.

From the Kalven-Zeisel data we note that 'majority persuasion' is taking place in bringing the jury from initial ballot to final verdict. The number of votes for acquittal on the initial ballot seems to determine the outcome, except for some infrequent reversals. For example, with 1 to 5 votes for acquittal on the initial ballot, the final verdict is not guilty only five percent of the time. With 7 to 11 votes for acquittal on the initial ballot, the final verdict is guilty only two percent of the time. In the former situation there is also a nine percent chance of a hung jury, and in the latter situation a seven percent chance of a hung jury.

Let us suppose that the first-ballot majority always prevails and that, in the case of an evenly split first ballot, there is an even chance which way the decision will go. The latter assumption is borne out by the data in Table 3, but there are only ten cases out of 225 in which the vote is evenly split. Under such assumptions, the probability of conviction is

$$P_C = \sum_{i=0}^{5}\gamma_{12,i} + \tfrac{1}{2}\gamma_{12,6},$$

with $P_A = 1 - P_C$ and $P_H$, the probability of a hung jury, equal to zero. Then the probability that a defendant is guilty given conviction is

$$P_{G|C} = \frac{1}{P_C}\left[\sum_{i=0}^{5}\rho_{12,i}\,\gamma_{12,i} + \tfrac{1}{2}\rho_{12,6}\,\gamma_{12,6}\right]$$

and $P_{I|C} = 1 - P_{G|C}$. Likewise, the probability that a defendant is innocent given acquittal is

$$P_{I|A} = \frac{1}{P_A}\left[\sum_{i=7}^{12}(1-\rho_{12,i})\,\gamma_{12,i} + \tfrac{1}{2}(1-\rho_{12,6})\,\gamma_{12,6}\right].$$

Employing the maximum likelihood values θ = .69, μ = .88, Gelfand and Solomon (1975) show that the conditional quantities $P_{G|A}$ and $P_{I|C}$ are approximately twenty times larger for a jury of size six than for a jury of size twelve. Naturally this assumes the values of θ and μ remain the same for the two jury sizes. Also, this approximation by simple majority persuasion is somewhat crude, and we now seek to improve the model.
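Below is a sketch of the simple majority-persuasion calculation for any jury size. Like the text, it assumes the same θ and μ for both sizes. The function names are hypothetical. The code computes the joint probabilities $\rho_{n,i}\gamma_{n,i}$ directly rather than going through $\rho_{n,i}$.

```python
from math import comb


def split_joint(n, theta, mu):
    """Joint probabilities P(guilty, i acquittal votes) and P(innocent, i)."""
    guilty = [comb(n, i) * theta * mu ** (n - i) * (1 - mu) ** i
              for i in range(n + 1)]
    innocent = [comb(n, i) * (1 - theta) * mu ** i * (1 - mu) ** (n - i)
                for i in range(n + 1)]
    return guilty, innocent


def majority_rule_errors(n, theta, mu):
    """(P_C, P_{I|C}, P_{G|A}) when the first-ballot majority always prevails.

    An even split (possible only for even n) is settled by a fair coin; no
    jury hangs, so P_A = 1 - P_C.
    """
    guilty, innocent = split_joint(n, theta, mu)
    # probability of a conviction given i acquittal votes on the first ballot
    w = [1.0 if 2 * i < n else 0.5 if 2 * i == n else 0.0 for i in range(n + 1)]
    p_c = sum(wi * (g + h) for wi, g, h in zip(w, guilty, innocent))
    p_g_and_c = sum(wi * g for wi, g in zip(w, guilty))
    p_i_and_a = sum((1 - wi) * h for wi, h in zip(w, innocent))
    p_a = 1 - p_c
    return p_c, 1 - p_g_and_c / p_c, 1 - p_i_and_a / p_a


twelve = majority_rule_errors(12, 0.69, 0.88)
six = majority_rule_errors(6, 0.69, 0.88)
```

Comparing `six[1]` with `twelve[1]`, and `six[2]` with `twelve[2]`, shows how much larger both error probabilities become for the six member jury under these assumptions.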

A next step in refining this model is to modify the pure majority persuasion aspect using the Kalven-Zeisel data. For example, we can write

$$P_C = \gamma_{12,0} + .86\sum_{i=1}^{5}\gamma_{12,i} + .5\,\gamma_{12,6} + .02\sum_{i=7}^{11}\gamma_{12,i},$$

since a guilty verdict resulted in 86 percent of the trials whose initial ballot had 1 to 5 votes for acquittal, and in two percent of those whose initial ballot had 7 to 11 votes for acquittal.

We can also write

$$P_A = \gamma_{12,12} + .91\sum_{i=7}^{11}\gamma_{12,i} + .05\sum_{i=1}^{5}\gamma_{12,i} + .5\,\gamma_{12,6},$$

and then

$$P_H = 1 - P_A - P_C,$$

obtaining for the first time a value for $P_H$ that is not equal to zero. These also lead to values for $P_{G|C}$ and $P_{I|A}$, and consequently to $P_{I|C} = 1 - P_{G|C}$ and $P_{G|A} = 1 - P_{I|A}$. Once again these conditional probabilities are shown to vary considerably between juries of size six and size twelve (Gelfand and Solomon, 1975).
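The refined scheme is easy to evaluate. This sketch uses hypothetical names and assumes θ = .69, μ = .88. The hung-jury probability comes out as .09 of the 1-5 cell plus .07 of the 7-11 cell. The 6-6 split contributes nothing, because it is still divided evenly between conviction and acquittal.

```python
from math import comb


def gamma_dist(n, theta, mu):
    """Two-parameter first-ballot distribution gamma_{n,0..n}."""
    return [comb(n, i) * (theta * mu ** (n - i) * (1 - mu) ** i
                          + (1 - theta) * mu ** i * (1 - mu) ** (n - i))
            for i in range(n + 1)]


def kz_verdict_probs(theta, mu):
    """(P_C, P_A, P_H) using the Kalven-Zeisel reversal and hung-jury rates."""
    g = gamma_dist(12, theta, mu)
    low = sum(g[1:6])    # 1 to 5 votes for acquittal
    high = sum(g[7:12])  # 7 to 11 votes for acquittal
    p_c = g[0] + 0.86 * low + 0.5 * g[6] + 0.02 * high
    p_a = g[12] + 0.91 * high + 0.05 * low + 0.5 * g[6]
    p_h = 1 - p_a - p_c
    return p_c, p_a, p_h


p_c, p_a, p_h = kz_verdict_probs(0.69, 0.88)
```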

For our final modification of the model in going from initial ballot to final verdict, we employ a blend of theory and empirical evidence from mock jury data. In a subsequent section we try our hand at taking the jury through several ballots from initial decision to final verdict. For our mock jury data, we use the results of studies conducted by Davis and his collaborators, who develop social decision schemes to take the jury from initial ballot directly to final verdict. One such scheme, which we employ and then modify, is by Davis (1973):

[Table: social decision scheme of Davis (1973), by number of votes for acquittal on first ballot]
Note that in the Kalven-Zeisel data of Table 3 we are restricted to five columns because the initial ballot data have been aggregated that way. Here, through the experimental study, we have all thirteen columns for acquittal votes on the first ballot. Note also that majority persuasion is exhibited by the results of the experimentation.

To incorporate these data in a meaningful way we return to our modeling. We first consider the twelve member jury. Given a first-ballot stance, i.e., the number of votes for acquittal on the first ballot, we wish to find the probabilities associated with each of the three possible jury conclusions. Label these three probabilities $P_C(i)$, $P_A(i)$, $P_H(i)$, where, for instance, $P_C(i)$ is the probability the jury ultimately convicts given i votes for acquittal on the first ballot. Obviously, for any fixed value of i, the sum of the three probabilities is one. Although each juror has two choices, the jury has three. Employing our previous notation, we write

$$P_C = \sum_{i=0}^{12}P_C(i)\,\gamma_{12,i},\qquad P_A = \sum_{i=0}^{12}P_A(i)\,\gamma_{12,i},\qquad P_H = \sum_{i=0}^{12}P_H(i)\,\gamma_{12,i},$$

or the matrix equation

$$P = D\gamma,$$

where P is a column vector with three rows, or equivalently $P' = (P_C, P_A, P_H)$; γ is a column vector with 13 rows, or $\gamma' = (\gamma_{12,0}, \gamma_{12,1}, \ldots, \gamma_{12,12})$; and D is a social decision matrix with three rows and thirteen columns.

If we now evaluate $\gamma_{12,i}$ for i = 0, 1, ..., 12 using θ = .69, μ = .88, and employ the social decision scheme from Davis, we can compute P and then compare its three coordinates with empirical values from Kalven and Zeisel. Their study included 3576 jury trials, of which only 225 had known initial ballots. For all 3576 trials we have P' = (.642, .303, .055), and for the 225 trials the corresponding proportions are (.62, .32, .06). Both estimates of P' come from the data.

When the P' coordinates are obtained from the model, the resulting goodness-of-fit statistic is $\chi^2 = 8.62$, which is acceptable at the .01 level. By some very slight adjustments in the matrix D supplied by Davis (1973), Gelfand and Solomon (1977) obtain a revised social decision matrix:

[Table: revised social decision matrix D, by number of votes for acquittal on initial ballot]

With this matrix we obtain $\hat{P}' = (.6419, .3024, .0557)$, leading to $\chi^2 = .0841$ (virtually a perfect fit); here $\hat{P}'$ denotes estimates from the model.
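The computation P = Dγ is a matrix-vector product, and the fit is checked with a $\chi^2$ statistic. Davis's matrix is not reproduced in this section. The sketch below therefore uses the Kalven-Zeisel-based weights from the earlier refinement as an illustrative D. The Pearson form of $\chi^2$, scaled by N = 3576 trials, is an assumption about how the statistic was computed. All names are hypothetical.

```python
from math import comb


def gamma_dist(n, theta, mu):
    """Two-parameter first-ballot distribution gamma_{n,0..n}."""
    return [comb(n, i) * (theta * mu ** (n - i) * (1 - mu) ** i
                          + (1 - theta) * mu ** i * (1 - mu) ** (n - i))
            for i in range(n + 1)]


def kz_decision_matrix():
    """3 x 13 matrix D with rows P_C(i), P_A(i), P_H(i) for i = 0..12.

    Built from the Kalven-Zeisel rates, not from Davis (1973).
    """
    p_c = [1.0] + [0.86] * 5 + [0.5] + [0.02] * 5 + [0.0]
    p_a = [0.0] + [0.05] * 5 + [0.5] + [0.91] * 5 + [1.0]
    p_h = [1 - c - a for c, a in zip(p_c, p_a)]
    return [p_c, p_a, p_h]


def apply_scheme(D, gamma):
    """P = D gamma: one verdict probability per row of D."""
    return [sum(d * g for d, g in zip(row, gamma)) for row in D]


def pearson_chi2(observed_props, expected_props, n_trials):
    """Pearson chi-square for proportions based on n_trials cases."""
    return sum(n_trials * (o - e) ** 2 / e
               for o, e in zip(observed_props, expected_props))


P = apply_scheme(kz_decision_matrix(), gamma_dist(12, 0.69, 0.88))
chi2 = pearson_chi2([0.642, 0.303, 0.055], P, 3576)
```

Swapping in another 3 × 13 matrix, for instance a transcription of Davis's scheme, requires no other changes.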

Values that depart from θ = .69, μ = .88 yield a model that does not reproduce the observed P vector. This suggests that we can now, within this model, attempt to compute the conditional probabilities of interest

$$P_{G|C} = \frac{\sum_{i=0}^{12}P_C(i)\,\rho_{12,i}\,\gamma_{12,i}}{\sum_{i=0}^{12}P_C(i)\,\gamma_{12,i}},\qquad P_{I|A} = \frac{\sum_{i=0}^{12}P_A(i)\,(1-\rho_{12,i})\,\gamma_{12,i}}{\sum_{i=0}^{12}P_A(i)\,\gamma_{12,i}},$$

where $P_{I|C} = 1 - P_{G|C}$ and $P_{G|A} = 1 - P_{I|A}$. After some computer calculations, we now get

$P_{I|C}$ = .0221,
$P_{G|A}$ = .0615.
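The two conditional error probabilities can be computed for any decision matrix D. This sketch again uses the Kalven-Zeisel-based rows as an illustrative D, not the Davis or Gelfand-Solomon matrices. It returns the joint probabilities as well, so the bookkeeping can be checked: the guilty mass over the three verdicts must add back up to θ.

```python
from math import comb


def joint_by_status(n, theta, mu):
    """P(guilty, i) = rho_{n,i} gamma_{n,i} and the innocent analogue."""
    guilty = [comb(n, i) * theta * mu ** (n - i) * (1 - mu) ** i
              for i in range(n + 1)]
    innocent = [comb(n, i) * (1 - theta) * mu ** i * (1 - mu) ** (n - i)
                for i in range(n + 1)]
    return guilty, innocent


def scheme_errors(D, n, theta, mu):
    """Return (P_{I|C}, P_{G|A}, joint) for decision matrix D = [P_C, P_A, P_H]."""
    guilty, innocent = joint_by_status(n, theta, mu)
    joint = {}
    for verdict, row in zip("CAH", D):
        joint["G" + verdict] = sum(w * g for w, g in zip(row, guilty))
        joint["I" + verdict] = sum(w * h for w, h in zip(row, innocent))
    p_i_given_c = joint["IC"] / (joint["GC"] + joint["IC"])
    p_g_given_a = joint["GA"] / (joint["GA"] + joint["IA"])
    return p_i_given_c, p_g_given_a, joint


# Illustrative D built from the Kalven-Zeisel rates (not the Davis matrix).
p_c_row = [1.0] + [0.86] * 5 + [0.5] + [0.02] * 5 + [0.0]
p_a_row = [0.0] + [0.05] * 5 + [0.5] + [0.91] * 5 + [1.0]
p_h_row = [1 - c - a for c, a in zip(p_c_row, p_a_row)]
p_i_c, p_g_a, joint = scheme_errors([p_c_row, p_a_row, p_h_row], 12, 0.69, 0.88)
```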


If we now try this approach for six member juries there are additional problems, mainly because of the lack of data. Nevertheless, we now attempt to compute $P_{I|C}$ and $P_{G|A}$ for this situation. First, we assume θ and μ are the same as for twelve member juries, an assumption that can easily be challenged. Then we propose a social decision scheme for six member juries embodying majority persuasion, but the cell entries we choose can also be easily contested. However, the aforementioned work of Davis and his collaborators has also provided similar six member jury decision schemes from mock jury experiments. Employing the social decision matrix given below, we obtain $P_C$ = .6347, $P_A$ = .3207, $P_H$ = .0446. These values are very close to what experience shows with 12 member juries, but actual experience may be quite different for six member juries. On the other hand, one may feel quite comfortable with replicating what has been going on in society. In either event, if we continue we obtain $P_{I|C}$ = .0325 and $P_{G|A}$ = .1395. This demonstrates quite crucial differences: a six member jury will convict 50% more innocent defendants and will set free twice as many guilty defendants. Naturally, these results rely on the assumptions of the model and the use of the Kalven-Zeisel data. It is interesting to recall Poisson's estimates based on twelve member juries and majority decisions a century and a half ago in France; namely .06 and

It may also be instructive to look at some empirical results. Baldwin and McConville (1979), in their book Jury Trials, report on twelve member jury decisions during 1975-76 in Birmingham, England, and estimate the two conditional probabilities of error. Their anchor for the 'true' situation (convictable or non-convictable) is the judges' assessment, together with the views of police, prosecution and defense attorneys. These are then compared with the jury decision and lead to $P_{I|C}$ > .05 and $P_{G|A}$ > .36. The value of $P_{G|A}$ seems quite high. Kalven and Zeisel also present tables of judge and jury disagreement. One of their tables (Table 12, p. 58) shows that, over 3,576 trials, the jury convicts 3 percent of the time when the judge would acquit, and the judge convicts 19 percent of the time when the jury acquits. In this analysis hung jury trials are omitted. The estimates $P_{I|C}$ = .03 and $P_{G|A}$ = .19 (assuming the judge's verdict is the truth) may be contrasted with the other values just quoted. As in the British empirical experience, the estimate of $P_{G|A}$ seems rather high.

The Poisson model we have modified by a majority persuasion decision scheme, to go from initial to final ballot, will yield smaller values for $P_{I|C}$ and $P_{G|A}$ as the jury size increases. This suggests that the traditional jury of size twelve be increased in number to lessen the two risks. This could be more costly (the major argument for juries of size six instead of twelve revolves around cost), but other issues not yet treated would require consideration. Studies of group behavior indicate that dead time and poorer performance could result from increases in group size.

There is some literature on this phenomenon, but more research would be required on juries of size 12 to 24 (roughly the grand jury size in the U.S.) to study the inhibiting effects, if any, of large jury sizes. For additional discussion of mathematical models of jury decision making, see Grofman (1981), who presents the state of the art up to the present. His paper gives a detailed account of various models, including the Gelfand-Solomon modification of the Poisson development featured here.

We have already remarked on the availability of jury data. For the past dozen years, data on the number of jury trials and jury decisions by crime in the U.S. Federal Courts have been published annually. A further breakdown would, of course, be helpful. What appears annually now is exactly the kind of information available to Poisson, except that the categories of crime enumerated may be more numerous. For those who wish to develop and test models, we do not seem better off than Poisson with respect to empirical data, although a body of mock jury data is growing.

The  Jury  from  Within:  Markovian  Models 

If we wish to follow the American jury from initial ballot to final verdict by allowing additional balloting, we will not have an empirical base. In what follows we nevertheless try our hand at this, keeping in mind the liabilities thus imposed. Klevorick and Rothschild (1979) develop another multi-ballot model and, as we do, stress caution because of the simplifying assumptions employed.

Penrod and Hastie (1979) include some discussion of multi-ballot models. Much of what follows appears in an unpublished report by Gelfand and Solomon (1974).

Let us consider how the jury, ballot by ballot, ultimately arrives at a decision. Given the paucity of data on behavior in the jury room, this presents a formidable estimation problem because of the increase in the number of parameters in such a model. From a stochastic point of view, we wish to develop both stationary (homogeneous) and non-stationary (non-homogeneous) Markov models with appropriate transition matrices. Their entries give the probability of j votes for acquittal on the (n+1)st ballot given i votes for acquittal on the nth ballot; let us denote this probability by $p_n(j|i)$. The Markovian assumption seems quite reasonable, but an assumption of stationarity most likely is not. This extended model can be modified to obtain fewer states by grouping ballot outcomes, thus requiring estimation of fewer parameters. Ultimately we shall do this. For now, regardless of the number of states, it is critical to note that the states i = 0 (all guilty votes) and i = 12 (all not-guilty votes) are absorbing barriers. Moreover, the chain has a finite state space, and it is reasonable to postulate that an absorbing state is accessible from any intermediate state. Hence, if the transition matrix is stationary, all other states must be transient. This leads to the rather unsatisfactory conclusion that all juries arrive at either a guilty or an innocent verdict, i.e., that there is no possibility of a hung jury. Some modification is therefore required to allow for one.

Actually we are overstating the situation. The remarks above imply only that the probability of remaining in a 'transient' state goes to zero as the number of ballots, n, goes to ∞. For finite n there is a positive probability of not yet being absorbed (i.e., of 'hanging'). Perhaps it is just a question of deciding on a finite number of ballots within which the jury must achieve a decision or declare itself deadlocked. That is, given enough ballots, unanimity would be reached; alternatively, we could ensure that, say, 5% of the time the jury is hung. For majority verdicts, the percentage of hung juries should be less than 5%.
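The claim that the transient mass vanishes under a stationary matrix can be illustrated with a small example. The transition probabilities below are hypothetical and chosen only to mimic majority persuasion: states 0 and 12 absorb, the jury drifts toward conviction below 6 and toward acquittal above 6, and it moves either way at 6. The sketch tracks the probability that a jury starting from a 6-6 split is still undecided after each ballot.

```python
N = 12


def toy_transition_matrix():
    """Hypothetical stationary chain on 0..12 votes for acquittal."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = P[N][N] = 1.0  # unanimous ballots are absorbing
    for i in range(1, N):
        if i < 6:
            down, up = 0.6, 0.1   # drift toward conviction
        elif i > 6:
            down, up = 0.1, 0.6   # drift toward acquittal
        else:
            down, up = 0.35, 0.35
        P[i][i - 1] = down
        P[i][i + 1] = up
        P[i][i] = 1 - down - up
    return P


def step(dist, P):
    """One more ballot: the row vector dist times P."""
    return [sum(dist[i] * P[i][j] for i in range(N + 1)) for j in range(N + 1)]


P = toy_transition_matrix()
dist = [0.0] * (N + 1)
dist[6] = 1.0  # evenly split first ballot
undecided = []  # probability of still 'hanging' after each ballot
for ballot in range(200):
    dist = step(dist, P)
    undecided.append(sum(dist[1:N]))
```

The `undecided` sequence never increases and shrinks toward zero. Stopping after a fixed number of ballots, as suggested above, is one way to leave a positive probability of a hung jury.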

However, one could make a stronger argument for a nonstationary structure, as follows. On the early ballots the number of votes for acquittal, i, may change quite a bit. After a few ballots, however, the jurors begin to 'lock into a position', after which the jury stance will change by perhaps a vote or not at all. Thus the transition matrix cannot be stationary; in fact, it should tend to an identity matrix, i.e., all states ultimately become absorbing. Therefore, even given an infinite number of ballots, the jury would not necessarily achieve a unanimous position. Since this is somewhat tentative and exploratory, we shall examine both models.

Let us first consider a stationary Markov setting. We describe a transition matrix $P = \{p_{ij}\}$, where $p_{ij} = p_n(j|i) = p(j|i)$ (i.e., it is independent of n). Just as we have done for the initial distribution of votes for acquittal, we shall describe the conditional distribution of votes for acquittal on the present ballot by a mixed binomial whose parameter values depend on i, the number of votes for acquittal on the previous ballot. That is, for i = 0, 1, ..., 12,

$$p_{ij} = \gamma_{12,j}[\mu_1(i), \mu_2(i), \theta(i)],\qquad j = 0, 1, \ldots, 12.$$

Actually, we will set $\mu_1(i) = \mu_2(i) = \mu(i)$ for convenience, so that $p_{ij} = \gamma_{12,j}[\mu(i), \theta(i)]$. These assumptions characterize each row of P by two unknown parameters instead of 13 (actually 12), thereby simplifying matters considerably. As long as a stationary transition structure is assumed, such conditional distributions seem satisfactory for appropriate μ(i) and θ(i).

No actual data are available on the average number of ballots taken before a jury is declared hung. However, Kalven and Zeisel (1966, pp. 458-459) report that in nearly 80% of trials resulting in hung juries the deliberation time is between two and ten hours. Thus we might guess that at least five and at most 20 ballots will be taken, with perhaps a median around ten. Naturally this is purely speculative. Our goal will be to select μ(i) and θ(i) such that $\Pi = P^n$, for n approximately 10, is a good approximation to D as suggested and modified in Gelfand and Solomon (1975, 1977), respectively. In other words, the first column vector of Π should approximate the first row of D, and the last column vector of Π should approximate the second row of D. The sum of the remaining column vectors of Π should then approximate the last row of D.

In selecting μ(i) and θ(i), we first observe that Table 5 suggests the following choices for θ(i):

$$\theta(i) = \begin{cases} 1, & i = 0, \ldots, 5,\\ 2/3, & i = 6,\\ 0, & i = 7, \ldots, 12.\end{cases}$$

These follow because $\rho_{12,i}$ is effectively the appropriate choice of θ to use given i votes for acquittal on the first ballot. Moreover, given the first-ballot position, the transition distribution would probably be unimodal, i.e., θ(i) = 0 or 1. Because of the essentially symmetric structure of D, we set μ(i) = μ(12 - i) for all i, with μ(6) = 1/2. Together with θ(i) = 1 - θ(12 - i), this makes rows i and 12 - i of P mirror images of each other. We then experiment with various choices of μ(i), i = 1, ..., 5. A rather satisfactory fit was achieved for μ(i) = 1 - (.08)i + .02, as indicated by Table 6. Only at i = 6 is the fit poor, and, as observed earlier, the effect will be insignificant. The fit can be somewhat refined, but to no particular advantage. In any case, the modeling formulation should be clear. The approximation displayed in the lower half of Table 6 can itself be taken as a social decision scheme. It can therefore be judged by how well it fits the Kalven and Zeisel data, in the same way as the schemes examined previously. In particular, for Case II of Table 5 we obtain an expected distribution vector $\hat{P}' = (.6355, .2976, .0669)$. Compared with the observed vector P, this yields a $\chi^2$ value of 6.71, which is remarkably small in view of the crudeness of our assumptions.
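The stationary 13-state construction can be sketched directly. The sketch builds each row of P as a mixed binomial with the θ(i) and μ(i) stated above and takes μ(i) = μ(12 - i) as the mirror rule. It raises P to the 10th power and reads off the three quantities to compare with the rows of D. The number of ballots (10) is the speculative median from the text. Function names are hypothetical.

```python
from math import comb

N = 12


def mixed_binomial_row(mu, theta):
    """Row i of P: gamma_{12,j}[mu(i), theta(i)] for j = 0..12."""
    return [comb(N, j) * (theta * mu ** (N - j) * (1 - mu) ** j
                          + (1 - theta) * mu ** j * (1 - mu) ** (N - j))
            for j in range(N + 1)]


def mu_of(i):
    """mu(i) = 1 - .08 i + .02 for i = 1..5, mirrored about 6, mu(6) = 1/2."""
    if i == 6:
        return 0.5
    k = i if i < 6 else N - i
    return 1 - 0.08 * k + 0.02


def theta_of(i):
    return 1.0 if i < 6 else 0.0 if i > 6 else 2 / 3


def transition_matrix():
    P = []
    for i in range(N + 1):
        if i in (0, N):  # unanimous ballots are absorbing
            row = [0.0] * (N + 1)
            row[i] = 1.0
        else:
            row = mixed_binomial_row(mu_of(i), theta_of(i))
        P.append(row)
    return P


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matrix_power(A, n):
    R = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        R = matmul(R, A)
    return R


Pi = matrix_power(transition_matrix(), 10)
convict = [row[0] for row in Pi]                     # compare with row 1 of D
acquit = [row[N] for row in Pi]                      # compare with row 2 of D
hung = [1 - c - a for c, a in zip(convict, acquit)]  # compare with row 3 of D
```

Because rows i and 12 - i mirror each other, `convict[i]` equals `acquit[12 - i]` up to rounding.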

For a nonstationary Markovian approach, the technique just described cannot work. Even if μ(i) and θ(i) are allowed to depend on n, there is no way to select them so that $p_n(i|i)$ increases to 1 as n increases, i.e., so that the conditional distribution at each i tends to a degeneracy at that value. A mixture of binomials on 0, ..., 12 can become degenerate only at 0 or 12. More elaborate specification will be required for each i, considerably complicating the situation. We shall therefore examine instead the collapsed model described by Table 3, where the number of states is reduced to five. Specifically, we define the states as $S_1$: i = 0; $S_2$: i = 1, 2, 3, 4, 5; $S_3$: i = 6; $S_4$: i = 7, 8, 9, 10, 11; and $S_5$: i = 12. This interpretation is forced upon us by Table 6. Otherwise we might choose $S_2$: i = 1, 2, 3, 4; $S_3$: i = 5, 6, 7; and $S_4$: i = 8, 9, 10, 11, to better separate the states. Let us denote the transition matrix from the nth to the (n+1)st ballot by $Q_n$, a five-by-five matrix. In analogy with the larger situation, its entries are $q_n(j|i)$. If stationarity were assumed at this point, then, as in the 13-state model, we would have $Q_n = Q$ and $q_n(j|i) = q(j|i) = q_{ij}$. A reasonable form for Q is


$$Q = \begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
q(1|2) & q(2|2) & q(3|2) & q(4|2) & 0\\
0 & q(2|3) & q(3|3) & q(4|3) & 0\\
0 & q(2|4) & q(3|4) & q(4|4) & q(5|4)\\
0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

Specifying three elements in Row 2, two elements in Row 3, and three elements in Row 4 (the remaining element of each row being fixed because the row must sum to one), we can proceed exactly as before in an effort to approximate the decision pattern contained in Table 3. Again we can achieve a satisfactory fit.
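Below is a sketch of the collapsed stationary model. The numerical entries of Q are hypothetical placeholders. They respect the zero pattern above and a rough symmetry, but they are not the values used to fit Table 3. The helper names are also hypothetical.

```python
# States S1..S5 are indices 0..4. Entries in Rows 2-4 are hypothetical.
Q = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.35, 0.55, 0.06, 0.04, 0.0],
    [0.0, 0.45, 0.10, 0.45, 0.0],
    [0.0, 0.04, 0.06, 0.55, 0.35],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]


def run_ballots(dist, Q, n):
    """Propagate a distribution over S1..S5 through n more ballots."""
    for _ in range(n):
        dist = [sum(dist[i] * Q[i][j] for i in range(5)) for j in range(5)]
    return dist


def verdicts_from(state, n=10):
    """(P convict, P acquit, P still hung) after n ballots from a start state."""
    start = [0.0] * 5
    start[state] = 1.0
    d = run_ballots(start, Q, n)
    return d[0], d[4], sum(d[1:4])


from_s2 = verdicts_from(1)  # first ballot in S2 (1 to 5 votes for acquittal)
from_s4 = verdicts_from(3)  # first ballot in S4 (7 to 11 votes for acquittal)
```

Fitting would mean adjusting the eight free entries until `from_s2` and `from_s4` match the corresponding columns of Table 3.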

Let us now turn to the nonstationary case. In particular, let us suppose the sort of non-homogeneous behavior described earlier. As a result of the first few ballots there may be a considerable change in the jury stance, but then the situation stabilizes and at most one juror will change his vote. Hence we assume

(i) $q_n(1|1) = 1$, $q_n(i|1) = 0$ for i = 2, 3, 4, 5, for all n; and $q_n(5|5) = 1$, $q_n(i|5) = 0$ for i = 1, 2, 3, 4, for all n;

(ii) $q_n(i|2) = 0$ for i = 3, 4, 5 and n ≥ 3; and $q_n(i|4) = 0$ for i = 1, 2, 3 and n ≥ 3;

(iii) $q_n(i|3) = 0$ for i = 1, 3, 5, for all n.

If n ≥ 3 in assumption (ii) seems too early for such stability, it is easy to adjust what follows for a somewhat larger n. Assumption (iii) becomes implicit beyond the n specified in (ii). For smaller n it is neither unreasonable nor critical, in view of the infrequency of $S_3$ and the initial instability. Again, it can be modified if desired. Let us define

Q(C|2) = P(conviction given state $S_2$ on first ballot)
Q(A|2) = P(acquittal given state $S_2$ on first ballot)
Q(C|4) = P(conviction given state $S_4$ on first ballot)
Q(A|4) = P(acquittal given state $S_4$ on first ballot)

Under (i), (ii), (iii) we determine

$$Q(C|2) = q_1(1|2) + q_1(2|2)\,q_2(1|2) + U\left[q_1(3|2)\,q_2(2|3) + q_1(2|2)\,q_2(2|2)\right]$$
$$Q(A|2) = V\,q_1(3|2)\,q_2(4|3)$$
$$Q(C|4) = U\,q_1(3|4)\,q_2(2|3)$$
$$Q(A|4) = q_1(5|4) + q_1(4|4)\,q_2(5|4) + V\left[q_1(3|4)\,q_2(4|3) + q_1(4|4)\,q_2(4|4)\right]$$

where

$$U = q_3(1|2) + \sum_{j=4}^{\infty} q_j(1|2)\prod_{i=3}^{j-1} q_i(2|2),\qquad V = q_3(5|4) + \sum_{j=4}^{\infty} q_j(5|4)\prod_{i=3}^{j-1} q_i(4|4).$$

In the spirit of our previous discussion we let $q_n(1|2) = \alpha^n q_1(1|2)$ and $q_n(5|4) = \beta^n q_1(5|4)$, with 0 < α < 1 and 0 < β < 1. We also set $q_1(3|2) = q_2(3|2)$, $q_1(3|4) = q_2(3|4)$, and 1/4 ≤ $q_1(2|3) = q_2(2|3)$ ≤ 3/4. Our goal is to fit the estimates suggested by Table 3, namely Q(C|2) = .86, Q(A|2) = .05, Q(C|4) = .02, Q(A|4) = .91. After some numerical experimentation, the best fit was observed to be in the vicinity of the following parameter values: $q_1(1|2)$ = .80, α = .75, $q_1(5|4)$ = .75, β = .80, $q_1(3|2)$ = .15, $q_1(3|4)$ = .15 and $q_1(2|3)$ = 1/3. For these plausible values we obtain Q(C|2) = .861, Q(A|2) = .044, Q(C|4) = .021, Q(A|4) = .891, a surprisingly good and rather satisfactory fit to the data.
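The series U and V are absorption probabilities for a jury that, from ballot 3 on, either reaches unanimity or stays in its current state. By (ii), $q_i(2|2) = 1 - q_i(1|2)$ and $q_i(4|4) = 1 - q_i(5|4)$ for i ≥ 3. Below is a minimal sketch using the decay $q_j(1|2) = \alpha^j q_1(1|2)$ and $q_j(5|4) = \beta^j q_1(5|4)$ with the parameter values quoted above. Truncating the infinite sums after 200 ballots is an assumption, and the function name is hypothetical.

```python
def eventual_absorption(hazards):
    """Truncated series sum_j h_j * prod_{i<j} (1 - h_i).

    At each ballot the jury is absorbed with probability h_j and otherwise
    stays put. Returns (absorbed, still_waiting); the two always add to one.
    """
    absorbed, survive = 0.0, 1.0
    for h in hazards:
        absorbed += h * survive
        survive *= 1 - h
    return absorbed, survive


alpha, q1_12 = 0.75, 0.80  # alpha and q_1(1|2)
beta, q1_54 = 0.80, 0.75   # beta and q_1(5|4)
# Hazards q_j(1|2) and q_j(5|4) for ballots j = 3, ..., 202
U, u_tail = eventual_absorption([alpha ** j * q1_12 for j in range(3, 203)])
V, v_tail = eventual_absorption([beta ** j * q1_54 for j in range(3, 203)])
```

The hazards shrink geometrically, so their sum is finite. The probability of never being absorbed (`u_tail`, `v_tail`) therefore stays strictly positive. This is how the nonstationary model leaves room for a hung jury.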

In concluding this section, we state again that our aim for the multi-ballot model has been to develop credible exploratory models of two aspects of jury behavior. The first is the overall decision-making process given the initial ballot. The second is the ballot-to-ballot transitions that occur on the way to a jury decision. Additional analysis is required along these lines, but the lack of data makes this a somewhat esoteric exercise.